163 research outputs found

    Statistical Properties of Convex Clustering

    Full text link
    In this manuscript, we study the statistical properties of convex clustering. We establish that convex clustering is closely related to single linkage hierarchical clustering and kk-means clustering. In addition, we derive the range of tuning parameter for convex clustering that yields a non-trivial solution. We also provide an unbiased estimate of the degrees of freedom, and provide a finite sample bound for the prediction error for convex clustering. We compare convex clustering to some traditional clustering methods in simulation studies.Comment: 20 pages, 5 figure

    Sure Screening for Gaussian Graphical Models

    Full text link
    We propose {graphical sure screening}, or GRASS, a very simple and computationally-efficient screening procedure for recovering the structure of a Gaussian graphical model in the high-dimensional setting. The GRASS estimate of the conditional dependence graph is obtained by thresholding the elements of the sample covariance matrix. The proposed approach possesses the sure screening property: with very high probability, the GRASS estimated edge set contains the true edge set. Furthermore, with high probability, the size of the estimated edge set is controlled. We provide a choice of threshold for GRASS that can control the expected false positive rate. We illustrate the performance of GRASS in a simulation study and on a gene expression data set, and show that in practice it performs quite competitively with more complex and computationally-demanding techniques for graph estimation

    Convex Modeling of Interactions with Strong Heredity

    Full text link
    We consider the task of fitting a regression model involving interactions among a potentially large set of covariates, in which we wish to enforce strong heredity. We propose FAMILY, a very general framework for this task. Our proposal is a generalization of several existing methods, such as VANISH [Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso, and the lasso using only main effects. It can be formulated as the solution to a convex optimization problem, which we solve using an efficient alternating directions method of multipliers (ADMM) algorithm. This algorithm has guaranteed convergence to the global optimum, can be easily specialized to any convex penalty function of interest, and allows for a straightforward extension to the setting of generalized linear models. We derive an unbiased estimator of the degrees of freedom of FAMILY, and explore its performance in a simulation study and on an HIV sequence data set.Comment: Final version accepted for publication in JCG

    Selection and Estimation for Mixed Graphical Models

    Full text link
    We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different parametric form. In particular, we assume that each node's conditional distribution is in the exponential family. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies
    • …
    corecore